We show that the entropy of a message can be tested in a device-independent way. Specifically, we consider a prepare-and-measure scenario with classical or quantum communication, and develop two different methods for placing lower bounds on the communication entropy, given observable data. The first method is based on the framework of causal inference networks. The second technique, based on convex optimization, shows that quantum communication provides an advantage over classical, in the sense of requiring a lower entropy to reproduce given data. These ideas may serve as a basis for novel applications in device-independent quantum information processing.

The development of device-independent (DI) quantum information processing has attracted growing attention recently. The main idea behind this new paradigm is to achieve quantum information tasks, and guarantee their secure implementation, based on observed data alone. Thus no assumption about the internal working of the devices used in the protocol is in principle required. Notably, realistic protocols for DI quantum cryptography [?] and randomness generation [?] were presented, with proof-of-concept experiments for the second [?].

The strong security of DI protocols finds its origin in a more fundamental aspect of physics, namely the fact that certain physical quantities admit a model-independent description and can thus be certified in a DI way. The most striking example is Bell nonlocality [?], which can be certified (via Bell inequality violation) by observing strong correlations between the results of distant measurements. Notably, this is possible in quantum theory, by performing well-chosen local measurements on distant entangled particles. More recently, it was shown that the dimension of an uncharacterized physical system (loosely speaking, the number of relevant degrees of freedom) can also be tested in a DI way [?]. Conceptually, this allows us to study quantum theory inside a larger framework of physical theories, which has already brought insight to quantum foundations [?]. From a more applied point of view, this allows for DI protocols and for black-box characterization of quantum systems [?].

In this context, it is natural to ask whether there exist other physical quantities which admit a DI characterization. Here we show that this is the case by demonstrating that the entropy of a message can be tested in a DI way. Specifically, we present simple and efficient methods for placing lower bounds on the entropy of a classical (or quantum) communication based on observable data alone. We construct such "entropy witnesses" following two different approaches, first using the framework of causal inference networks [21], and second using convex optimization techniques. The first construction is very general, but usually gives suboptimal bounds. The second construction allows us to place tight bounds on the entropy of classical messages for given data. Moreover, it shows that quantum systems provide an advantage over classical ones, in the sense that they typically require lower entropy to reproduce a given set of data.

FIG. 1. Prepare-and-measure scenario. (a) Black-box representation. (b) Representation as a DAG. (c) Finer description of the prepare-and-measure scenario where the number of measurements is explicitly taken into account.

Scenario.—We consider the prepare-and-measure scenario depicted in Fig. 1(a). It features two uncharacterized devices, hence represented by black boxes: a preparation and a measurement device. Upon receiving input $x$ (chosen among $n$ possible settings), the preparation device sends a physical system to the measuring device. The state of the system may contain information about $x$. Upon receiving input $y$ (chosen among $l$ settings) and the physical system sent by the preparation device, the measuring device provides an outcome $b$ (with $k$ possible values). The experiment is thus fully characterized by the probability distribution $p(b|x,y)$. The inputs $x, y$ are chosen by the observer, from a distribution $p(x,y)$, which will be taken here to be uniform and independent, i.e. $p(x) = 1/n$ and $p(y) = 1/l$ (unless stated otherwise). A set of data $p(b|x,y)$ will also be represented using the vector notation $\mathbf{p}$, the $nlk$ components of $\mathbf{p}$ giving the probabilities $p(b|x,y)$.

Our main focus is the entropy of the mediating physical system, and our main goal will be to lower bound this entropy in a DI way, that is, based only on the observational data $\mathbf{p}$. We will consider both cases in which the mediating physical system is classical and quantum.

Let us first consider the quantum case. For each input $x$, the preparation device sends a quantum state $\rho_x$ (in a Hilbert space of finite dimension $d$). We are interested in the von Neumann entropy of the average emitted state

$$S(\rho) = -\mathrm{tr}(\rho \log \rho), \quad \text{where } \rho = \sum_x p(x)\,\rho_x. \tag{1}$$

Specifically we want to find the minimal $S(\rho)$ that is compatible with a given set of data, i.e. such that there exist states $\rho_x$ and measurement operators $M_{b|y}$ (acting on $\mathbb{C}^d$) such that $p(b|x,y) = \mathrm{tr}(\rho_x M_{b|y})$. Note that in general we want to minimize $S(\rho)$ without any restriction on the dimension $d$.

In the case of classical systems, for each input $x$, a message $m \in \{0, \dots, d-1\}$ is sent with probability $p(m|x)$. The average message $M$ is given by the distribution $p(m) = \sum_x p(m|x)\,p(x)$, with Shannon entropy

$$H(M) = -\sum_{m=0}^{d-1} p(m) \log p(m). \tag{2}$$

Again, for a given set of data, our goal is to find the minimal entropy compatible with the data, considering systems of arbitrary dimension $d$.
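As a minimal numerical sketch of the two quantities in Eqs. (1) and (2), the following evaluates the Shannon entropy of an average message and the von Neumann entropy of an average state. The function names, the use of NumPy, and the choice of base-2 logarithms are assumptions made for illustration; they are not part of the original text.

```python
import numpy as np

def shannon_entropy(p):
    """H = -sum_m p(m) log2 p(m), ignoring zero-probability entries."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def von_neumann_entropy(rho):
    """S = -tr(rho log2 rho), computed from the eigenvalues of rho."""
    eigvals = np.linalg.eigvalsh(np.asarray(rho))
    eigvals = eigvals[eigvals > 1e-12]  # drop numerically zero eigenvalues
    return float(-np.sum(eigvals * np.log2(eigvals)))

def average_message(p_m_given_x, p_x):
    """p(m) = sum_x p(m|x) p(x); rows of p_m_given_x are indexed by x."""
    return np.asarray(p_x, dtype=float) @ np.asarray(p_m_given_x, dtype=float)

def average_state(states, p_x):
    """rho = sum_x p(x) rho_x for a list of density matrices."""
    return sum(px * np.asarray(rho_x) for px, rho_x in zip(p_x, states))

# Illustrative inputs: two equiprobable preparations.
ket0 = np.array([1.0, 0.0])
ket_plus = np.array([1.0, 1.0]) / np.sqrt(2)
rho_avg = average_state([np.outer(ket0, ket0), np.outer(ket_plus, ket_plus)],
                        [0.5, 0.5])
p_avg = average_message([[1.0, 0.0], [0.0, 1.0]], [0.5, 0.5])
```

In this illustration two distinct classical messages sent with equal probability carry one bit of entropy, while the average of the two non-orthogonal states $|0\rangle$ and $|+\rangle$ has a von Neumann entropy strictly below one bit.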

Entropy vs dimension.—Since our goal is to derive DI bounds on the entropy without restricting the dimension, our work is complementary to that of Gallego et al. [10], where DI bounds on the dimension were derived. While the work of Ref. [10] derived DI lower bounds on worst-case communication, our goal is to place DI lower bounds on the average communication.

More formally, Ref. [10] presented so-called (linear) dimension witnesses, of the form

$$V(\mathbf{p}) = \mathbf{v} \cdot \mathbf{p} = \sum_{x,y,b} v_{xyb}\, p(b|x,y) \le L_d, \tag{3}$$

with (well-chosen) real coefficients $v_{xyb}$ and bound $L_d$. The inequality holds for any possible data generated with systems of dimension (at most) $d$. Hence if a given set of data $\mathbf{p}$ is found to violate a dimension witness, i.e. $V(\mathbf{p}) > L_d$, then this certifies the use of systems of dimension at least $d + 1$.

In this work, we look for entropy witnesses, that is, functions $W$ which can be evaluated directly from the data $\mathbf{p}$ with the following properties. First, for any $\mathbf{p}$ requiring a limited entropy, say $H \le H_0$, we have that

$$W(\mathbf{p}) \le L(H_0). \tag{4}$$

Moreover, there should exist (at least) one set of data $\mathbf{p}_0$ such that $W(\mathbf{p}_0) > L(H_0)$, thus requiring entropy $H > H_0$. The problem is defined similarly for quantum systems, replacing the Shannon entropy with the von Neumann entropy.

Before discussing methods for constructing entropy witnesses, it is instructive to see that DI tests of entropy and dimension are in general completely different. Specifically, we show via a simple example that certain sets of data may require the use of systems of arbitrarily large dimension $d$, but vanishing entropy.

Consider a prepare-and-measure scenario, and a strategy using classical systems of dimension $d + 1$. We consider $n = d^2$ choices of preparations, and $l = n - 1$ choices of measurements, each with binary outcome $b = \pm 1$. Upon receiving input $x \le d$, send message $m = x$; otherwise, send $m = 0$. The entropy of the average message (with uniform choice of $x$) is found to be $H(M) = (2/d)\log(d) - (1 - 1/d)\log(1 - 1/d)$, which tends to zero when $n \to \infty$ (and hence $d \to \infty$). However, the corresponding set of data, $\mathbf{p}_0$, cannot be reproduced using classical systems of dimension $d$. This can be checked using a class of dimension witnesses [10]:

$$I_n(\mathbf{p}) = \sum_{y=1}^{n-1} E_{1y} + \sum_{x=2}^{n} \sum_{y=1}^{n+1-x} v_{xy} E_{xy}, \tag{5}$$

where $E_{xy} = \sum_{b=\pm 1} b\, p(b|x,y)$ and $v_{xy} = 1$ if $x + y \le n$ and $-1$ otherwise. For the above strategy, we obtain $I_n(\mathbf{p}_0) > L_d = n(n-3)/2 + 2d - 1$. Therefore, the data $\mathbf{p}_0$ requires dimension at least $d + 1$, which diverges as $n \to \infty$, but has vanishingly small entropy in this limit.
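The entropy of the example strategy can be checked numerically. The sketch below builds the message distribution directly from the rule "send $m = x$ if $x \le d$, otherwise $m = 0$" with $n = d^2$ uniform inputs and compares it with the closed form quoted above. The function names and the use of base-2 logarithms are illustrative assumptions.

```python
import numpy as np

def example_message_entropy(d):
    """Entropy (in bits) of the average message for n = d**2 uniform inputs."""
    n = d * d
    p_m = np.zeros(d + 1)          # messages 0, 1, ..., d
    for x in range(1, n + 1):      # inputs x = 1, ..., n
        m = x if x <= d else 0
        p_m[m] += 1.0 / n
    nonzero = p_m[p_m > 0]
    return float(-np.sum(nonzero * np.log2(nonzero)))

def closed_form_entropy(d):
    """(2/d) log d - (1 - 1/d) log(1 - 1/d), valid for d >= 2."""
    return (2.0 / d) * np.log2(d) - (1.0 - 1.0 / d) * np.log2(1.0 - 1.0 / d)

# The entropy decreases as d grows, although the required dimension is d + 1.
values = {d: example_message_entropy(d) for d in (2, 10, 200)}
```

For growing $d$ the computed values shrink toward zero, which illustrates the separation between dimension and entropy discussed in the text.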

Entropy witnesses I.—The above example shows that testing entropy and testing dimension are distinct problems. Thus new methods are required for constructing DI entropy witnesses. We first discuss a construction based on the entropic approach to causal inference [22, 23]. To the prepare-and-measure scenario of Fig. 1(a), we associate a directed acyclic graph (DAG) depicted in Fig. 1(b). Each node of the graph represents a variable of the problem (inputs $X$, $Y$, output $B$, and message $M$), and the arrows indicate causal influence. Moreover, we allow the devices to act according to a common strategy, represented with an additional variable $\Lambda$ (taking values $\lambda$, with distribution $p(\lambda)$). We thus have that

$$p(b|x,y) = \sum_{\lambda,m} p(b|y,m,\lambda)\, p(m|x,\lambda)\, p(\lambda). \tag{6}$$

The key idea behind the entropic approach is the fact that the causal relationships of a given DAG are faithfully captured by linear equations in terms of entropies [23]. These relations, together with the so-called Shannon-type inequalities (valid for a collection of variables, regardless of any underlying causal structure), define a convex set (the entropic cone) which characterizes all the entropies compatible with a given causal structure. Note that for the quantum case, a similar analysis can be pursued, with the only notable difference that causal relations of the form (6) must be replaced with data-processing inequalities; see Appendix A and Refs. [22, 23] for more details.

Using the methods of [22, 23], we characterized the facets of the entropic cone for the DAG of Fig. 1(b). In the quantum case, the only non-trivial facet is given by

$$I(X : Y, B) \le S(\rho), \tag{7}$$

where $I(X : Y) = H(X) + H(Y) - H(X,Y)$ is the mutual information. Note that for the classical case, the Shannon entropy $H(M)$ replaces $S(\rho)$. The above inequality, which in fact follows directly from Holevo's bound [24], provides a simple and general bound for the entropy for given data, valid for an arbitrary number of preparations, measurements, and outcomes. However, this comes at the price of a very coarse-grained description of the data, and therefore will typically provide a poor lower bound on the entropy.
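To make the witness (7) concrete, the following sketch evaluates $I(X : Y, B)$ from a table $p(b|x,y)$ under uniform and independent inputs. The array layout `p[x, y, b]`, the function names, and the test data (a perfect 2-to-1 random access code, where $b$ equals the $y$-th bit of $x$) are assumptions chosen for illustration.

```python
import numpy as np

def entropy_bits(p):
    """Shannon entropy in bits of an array of probabilities of any shape."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def witness_mutual_information(p_b_given_xy):
    """I(X : Y,B) = H(X) + H(Y,B) - H(X,Y,B) for uniform, independent x and y."""
    p = np.asarray(p_b_given_xy, dtype=float)
    n, l, k = p.shape
    joint = p / (n * l)                       # p(x, y, b)
    h_x = entropy_bits(joint.sum(axis=(1, 2)))
    h_yb = entropy_bits(joint.sum(axis=0))
    h_xyb = entropy_bits(joint)
    return h_x + h_yb - h_xyb

# Illustrative data: x in {0,1,2,3} encodes two bits, b is the y-th bit of x.
rac = np.zeros((4, 2, 2))
for x in range(4):
    bits = (x >> 1) & 1, x & 1
    for y in range(2):
        rac[x, y, bits[y]] = 1.0
bound = witness_mutual_information(rac)
```

For these data the function returns one bit, although reproducing them perfectly requires a message that determines both bits of $x$. This is an instance of the coarse-graining mentioned above.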

It is possible to obtain a finer description by accounting explicitly for the fact that the number of measurements $l$ is fixed. To do so, we replace the variables $Y$, $B$ with $l$ new variables $B_y$, and split the variable $X$ into $l$ separate variables $X = (X_1, \dots, X_l)$, considering here $n = r^l$ for some integer $r$ [25].

We first discuss the case of $l = 2$ measurements. The corresponding DAG is illustrated in Fig. 1(c). Applying again the methods of Ref. [23], we find a single non-trivial inequality (up to symmetries)

$$I(X_1 : B_1) + I(X_2 : B_2) + I(X_1 : X_2 | B_1) - I(X_1 : X_2) \le S(\rho). \tag{8}$$

A general class of entropy witnesses can be obtained by extending the above inequality to the case of $l$ measurements (details in Appendix A 4):

$$\sum_{i=1}^{l} I(X_i : B_i) + \sum_{i=2}^{l} I(X_1 : X_i | B_i) - \sum_{i=1}^{l} H(X_i) + H(X_1, \dots, X_l) \le S(\rho). \tag{9}$$

These witnesses give relevant (although usually suboptimal) bounds on $S(\rho)$. For instance, we show in Appendix B that the maximal violation of the dimension witnesses $I_n(\mathbf{p})$ (given in (5)), which implies the use of systems of dimension $d = n$ [10], also implies maximal entropy, i.e. $S(\rho) \ge \log n$.


We note that similar entropy witnesses can be derived for the case of classical communication. In fact, it suffices to replace $S(\rho)$ with $H(M)$ in (8) and (9). Note that these inequalities are reminiscent of the principle of information causality [26], but considering here a prepare-and-measure scenario [?]. That is, we consider classical correlations and quantum communication rather than quantum correlations and classical communication. Therefore, these witnesses cannot distinguish classical from quantum systems. More specifically, given a set of data, the classical and quantum bounds on the entropy obtained from these witnesses will be the same, although the actual minimal entropies may differ in general, as we will see below.

To summarize, the entropic approach allows us to derive compact and versatile entropy witnesses, for scenarios involving an arbitrary number of preparations, measurements and outcomes. Moreover, the bounds obtained on the entropy are valid for systems of arbitrary dimension. Nevertheless, this approach has an important drawback, namely that the bounds we obtain will typically underestimate the minimum entropy actually required to produce a given set of data. The reason for this is that in general there exist many different sets of data giving rise to the same value of the witness [27], e.g. the LHS of (9). The entropy bound will thus correspond to the lowest possible value of $S(\rho)$ among these sets of data. This motivates us to investigate a different approach, which better exploits the structure of the data. We also note that for the witnesses above, we obtain the same entropy bound for classical and quantum systems. In the following, we will be able to distinguish them.

Entropy witnesses II.—We will now discuss a method for placing bounds on the entropy using the entire set of data $\mathbf{p}$. This method can then be simplified to make use of only linear functions of the probabilities $p(b|x,y)$; in this case, we shall see that entropy witnesses can be directly constructed from dimension witnesses. This will allow us to show that, in the DI setting, quantum systems can outperform classical ones in terms of entropy.

Consider the case of classical communication. At first sight, one of the main difficulties is that we need to consider strategies involving messages of arbitrary dimension. However, notice that in the case of a finite number $n$ of preparations, we can focus on messages of dimension $d \le n$ without loss of generality (see Appendix C). It then follows that we have a finite number $D$ of deterministic strategies labeled by $\lambda$. For each strategy, the message $m$ is given by a deterministic function, $g_\lambda(x)$, and the output $b$ is given by a deterministic function $f_\lambda(y,m)$. Then, any set of data can be decomposed as a convex combination over the deterministic strategies. More formally, we thus write $\mathbf{p} = A\mathbf{q}$, where $\mathbf{q}$ is a $D$-dimensional vector with components $q_\lambda = p(\lambda)$ representing the probability to use strategy $\lambda$, and $\sum_\lambda q_\lambda = 1$. The matrix $A$, of size $nlk \times D$, has elements $A_{(xyb),\lambda} = \sum_m \delta_{b, f_\lambda(y,m)}\, \delta_{m, g_\lambda(x)}$.

FIG. 2. Minimum values of $H(M)$ and $S(\rho)$ compatible with a given value of witnesses $I_3$ or $I_4$. Curves for classical (dotted) and quantum (solid) strategies are shown. The use of quantum strategies allows for a significant reduction in the communication entropy.

The problem can thus be expressed as follows:

$$\min H(M) \quad \text{s.t.} \quad A\mathbf{q} = \mathbf{p}, \quad q_\lambda \ge 0 \quad \text{and} \quad \sum_\lambda q_\lambda = 1, \tag{10}$$

where the minimization is taken over all possible convex combinations of deterministic strategies that reproduce $\mathbf{p}$. Notice that this set of possible convex decompositions of $\mathbf{p}$ forms a polytope $Q$ (in the space of $\mathbf{q}$). Thus, although the objective function $H(M)$ is not linear in $\mathbf{q}$, this problem can be addressed by noting that $H(M)$ is concave in $\mathbf{q}$. It follows that the minimum of $H(M)$ will be obtained for one of the vertices of $Q$.
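The objects entering the optimization (10) can be sketched by brute-force enumeration for very small scenarios. The code below lists all deterministic strategies $(g_\lambda, f_\lambda)$, builds the matrix $A$ mapping $\mathbf{q}$ to the data, and a second matrix giving the message distribution for each strategy under uniform inputs. It does not enumerate the vertices of $Q$; the function names, the index ordering of rows, and the zero-based labels are assumptions made for illustration.

```python
import itertools
import numpy as np

def build_strategy_matrices(n, l, k, d):
    """Return (A, B): A[(x,y,b), lam] in {0,1}, B[m, lam] = p(m | lam)."""
    cols_a, cols_b = [], []
    # g assigns a message to each input x; f assigns an output to each (y, m).
    for g in itertools.product(range(d), repeat=n):
        for f in itertools.product(range(k), repeat=l * d):
            a = np.zeros((n, l, k))
            p_m = np.zeros(d)
            for x in range(n):
                m = g[x]
                p_m[m] += 1.0 / n              # uniform p(x) = 1/n
                for y in range(l):
                    a[x, y, f[y * d + m]] = 1.0
            cols_a.append(a.ravel())
            cols_b.append(p_m)
    return np.array(cols_a).T, np.array(cols_b).T

def message_entropy(B, q):
    """H(M) in bits for the mixture q over deterministic strategies."""
    p_m = B @ np.asarray(q, dtype=float)
    p_m = p_m[p_m > 0]
    return float(-np.sum(p_m * np.log2(p_m)))

# Smallest non-trivial case: n = 2 inputs, l = 1 measurement, k = 2 outcomes.
A, B = build_strategy_matrices(n=2, l=1, k=2, d=2)
q_uniform = np.full(A.shape[1], 1.0 / A.shape[1])
data = A @ q_uniform                 # the data p reproduced by this mixture
h_uniform = message_entropy(B, q_uniform)
```

The number of strategies grows as $d^n k^{ld}$, which is consistent with the remark in the text that the full procedure quickly becomes computationally demanding.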

The above procedure is analytical, and can therefore be applied for any given $\mathbf{p}$, in principle. However, it is computationally too demanding, even in the simplest cases, mainly due to the characterization of the polytope $Q$. We thus further simplify the problem. First, we consider specific linear functions of the data $V(\mathbf{p})$ (instead of the entire data $\mathbf{p}$). The first condition in (10) thus becomes $V(A\mathbf{q}) = V(\mathbf{p})$. Moreover, we notice that this condition implies constraints on the distribution of the message $p(m)$, which can be characterized via a finite number of linear programs (see Appendix D for details).

We apply this method to the linear dimension witnesses $I_n(\mathbf{p})$ and illustrate it for $n = 3, 4$ (in Appendix E we also discuss the $2 \to 1$ random access code). For each value of the witness, we obtain the minimum of the entropy $H(M)$ compatible with it. The result is shown in Fig. 2 and clearly shows that $\min H(M)$ is a non-trivial function of $I_n$. However, as we show next, $\min H(M)$ can be achieved with a very simple strategy. Consider that the value of $I_n$ lies in the range $L_{d-1} < I_n \le L_d$, that is, requires the use of $d$-dimensional states. Upon receiving input $x < d - 1$, send message $m = x$; if $x = d$, send $m = d - 1$ with probability $p = (L_d - I_n)/2$, and send $m = d$ with probability $(1 - p)$; otherwise send $m = 0$. The entropy of the average message is then

$$H(M) = (d - 2)\,\frac{\log n}{n} - \alpha \log \alpha - \beta \log \beta, \tag{11}$$

where $\alpha = (1 - p)/n$ and $\beta = 1 - \alpha - (d - 2)/n$, and which coincides (up to numerical precision) with the analytical bound for $\min H(M)$ for $I_3$, $I_4$, and $I_5$. Interestingly, this result shows that $\min H(M)$ requires only messages of minimal dimension; that is, for a given value of the witness $L_{d-1} < I_n(\mathbf{p}) \le L_d$, systems of dimension $d$ are enough to achieve the lowest possible entropy. Another interesting feature is that no shared correlations between the preparation and measurement devices are needed. We also notice that, perhaps surprisingly, (11) turns out to provide optimal entropy for all dimension witnesses that we have tested (see Appendix E for further details). Whether this strategy is optimal for any dimension witness is an interesting open question. We highlight, nonetheless, that even if (11) does not hold in general, it still provides a non-trivial upper bound on $\min H(M)$.
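Eq. (11) can be read as the Shannon entropy of a message distribution with $d - 2$ labels of weight $1/n$, one label of weight $\alpha$ and one label of weight $\beta$. The following sketch checks this reading numerically. It assumes base-2 logarithms and that the first term of (11) is $(d-2)(\log n)/n$, which is what the normalization $\beta = 1 - \alpha - (d-2)/n$ suggests; the function names are hypothetical.

```python
import numpy as np

def entropy_eq11(n, d, p):
    """Closed form: (d-2) log(n)/n - alpha log(alpha) - beta log(beta)."""
    alpha = (1.0 - p) / n
    beta = 1.0 - alpha - (d - 2.0) / n
    value = (d - 2) * np.log2(n) / n
    for w in (alpha, beta):
        if w > 0:                      # convention: 0 log 0 = 0
            value -= w * np.log2(w)
    return float(value)

def entropy_from_distribution(n, d, p):
    """Same quantity, computed directly from the message distribution."""
    alpha = (1.0 - p) / n
    beta = 1.0 - alpha - (d - 2.0) / n
    dist = np.array([1.0 / n] * (d - 2) + [alpha, beta])
    assert abs(dist.sum() - 1.0) < 1e-12   # the weights are normalized
    dist = dist[dist > 0]
    return float(-np.sum(dist * np.log2(dist)))

# Example: n = 4 preparations, d = 3, mixing probability p = 0.5.
h_closed = entropy_eq11(4, 3, 0.5)
h_direct = entropy_from_distribution(4, 3, 0.5)
```

Scanning $p$ from 0 to 1 at fixed $n$ and $d$ with either function traces the entropy as a function of the witness value through $p = (L_d - I_n)/2$.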

A relevant question is now whether the use of quantum communication may help reduce the entropy. That is, for a given witness value, we ask what is the lowest possible entropy achievable using quantum systems. This is in general a difficult question, as we have no guarantee that using low-dimensional systems is optimal. Nevertheless, we can obtain upper bounds on $S(\rho)$ by considering low-dimensional systems. We performed numerical optimization for quantum strategies involving systems up to dimension $d = 4$ (see Appendix F). Results are presented in Fig. 2. Interestingly, the use of quantum systems allows for a clear reduction of the entropy (compared to classical messages) for basically any witness value. Whether the use of higher-dimensional systems could help reduce $S(\rho)$ further is an interesting question.

Discussion.—We have shown that the entropy of communication can be tested in a DI way, and presented two complementary methods tailored for this task. Our methods work for both classical and quantum communication, and the second method can be used to distinguish between classical and quantum systems for a given bound on the entropy.

Given the success of the DI approach for quantum information processing, it would be interesting to investigate potential applications based on the present work. While DI tests of dimension led to partially DI solutions for information tasks in the prepare-and-measure scenario [28, 29], it would be relevant to explore the possibilities offered by DI entropy tests.

Appendix A: A brief review of the entropic approach to causal inference and its application in the prepare-and-measure scenario

The entropic approach for classical DAGs consists of three steps: (1) List all the Shannon-type inequalities respected by a collection of $n$ variables, regardless of any underlying causal structure between them. (2) List the causal constraints that follow from a given causal structure. In terms of entropies these are linear constraints. (3) Marginalize the set of inequalities to the subspace of observable variables. Below we consider each of these steps and how they can be generalized to the quantum case, where some of the nodes in the DAG may represent quantum states. For more details see Refs. [22, 23].

1. Step 1: Listing the Shannon-type inequalities

To understand these constraints, consider a collection of $n$ discrete random variables $X_1, \dots, X_n$ associated to some joint distribution $p(X_1, \dots, X_n)$. Let $X_S$ be the random vector $(X_i)_{i \in S}$ and denote by $H(S) := H(X_S)$ its Shannon entropy, given by $H(X) = -\sum_x p(x) \log_2 p(x)$. Construct the associated entropy vector with $2^n$ real components, given by $h = (H(\emptyset), H(X_n), H(X_{n-1}), H(X_n, X_{n-1}), \dots, H(X_1, \dots, X_n))$, to represent all the collections of entropies for $n$ variables. Not every vector in $\mathbb{R}^{2^n}$ will correspond to an entropy vector since, for example, entropies are non-negative quantities. The region of real vectors that correspond to entropies still lacks an explicit description; however, an outer approximation to it is known, the so-called Shannon cone [30].

The Shannon cone is characterized by two basic sets of linear constraints, the so-called Shannon-type inequalities, and a normalization constraint. The first type are the monotonicity inequalities, for example $H(X_1, X_2) \ge H(X_1)$, stating that the uncertainty about a set of variables should always be larger than or equal to the uncertainty about any subset of it. The second type of inequalities are given by the strong subadditivity condition, which is equivalent to the positivity of the conditional mutual information. For example, $I(X_1 : X_2 | X_3) = H(X_1, X_3) + H(X_2, X_3) - H(X_1, X_2, X_3) - H(X_3) \ge 0$. Finally, the normalization constraint imposes $H(\emptyset) = 0$.
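The Shannon-type inequalities can be checked numerically on any joint distribution. The sketch below computes the entropy of every subset of variables from a joint probability array and tests the elemental monotonicity and strong subadditivity conditions, including conditioning on further variables. The array representation (one axis per variable) and the function names are illustrative assumptions.

```python
import itertools
import numpy as np

def entropy_of_subset(joint, subset):
    """Entropy in bits of the marginal of `joint` on the axes in `subset`."""
    drop = tuple(ax for ax in range(joint.ndim) if ax not in subset)
    marg = np.asarray(joint.sum(axis=drop)).ravel()
    marg = marg[marg > 0]
    return float(-np.sum(marg * np.log2(marg)))

def shannon_violations(joint, tol=1e-9):
    """Return a list of violated Shannon-type inequalities (empty if none)."""
    n = joint.ndim
    H = {frozenset(s): entropy_of_subset(joint, s)
         for r in range(n + 1)
         for s in itertools.combinations(range(n), r)}
    bad = []
    for S in H:
        rest = [i for i in range(n) if i not in S]
        for i in rest:
            # Monotonicity: H(S + i) >= H(S).
            if H[S | {i}] < H[S] - tol:
                bad.append(("monotonicity", tuple(S), i))
        for i, j in itertools.combinations(rest, 2):
            # Strong subadditivity: I(X_i : X_j | X_S) >= 0.
            cmi = H[S | {i}] + H[S | {j}] - H[S | {i, j}] - H[S]
            if cmi < -tol:
                bad.append(("subadditivity", tuple(S), i, j))
    return bad

# Illustrative check on a random distribution of three variables.
rng = np.random.default_rng(0)
joint = rng.random((2, 3, 2))
joint /= joint.sum()
violations = shannon_violations(joint)
```

Any genuine probability distribution passes these checks. The converse fails in general, which is why the Shannon cone is only an outer approximation.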

2. Step 2: Listing the causal constraints

For an illustration, let us consider the DAG associated with the prepare-and-measure scenario, depicted in Fig. 1(b).

Notice that in this causal structure we do not explicitly specify the numbers of different measurements or preparations. The causal constraints are encoded in the conditional independences implied by the causal structure. For instance, variables $X$ and $Y$ are not connected by arrows from one to the other, nor is there a third variable connecting them. Thus, these variables should be statistically independent, which can be represented entropically via a linear relation $I(X : Y) = 0$. In general, all the causal constraints following from a graph can be listed using the d-separation algorithm [21], but it is sufficient to use the so-called Markov decomposition. In the case of Fig. 1(b) this states that

$$p(x,y,b) = \sum_{\lambda,m} p(b|y,m,\lambda)\, p(m|\lambda,x)\, p(x)\, p(y)\, p(\lambda). \tag{A1}$$

Using this decomposition, we can list all the relevant causal constraints for the DAG. These are

$$H(X,Y,\Lambda) = H(X) + H(Y) + H(\Lambda), \tag{A2}$$
$$H(M|X,\Lambda) = 0, \tag{A3}$$
$$H(B|Y,M,\Lambda) = 0. \tag{A4}$$

Notice that, without loss of generality, we imposed that $H(M|X,\Lambda) = 0$ and $H(B|Y,M,\Lambda) = 0$, basically saying that these variables are deterministic functions of their parents (any additional randomness can be absorbed in $\Lambda$).


3. Step 3: Marginalization

Given the description of the Shannon cone of $n$ variables plus the causal constraints, we are interested in its projection onto the subspace containing only observable terms. This is achieved via a Fourier-Motzkin (FM) elimination [?]. The final set of inequalities obtained via the FM elimination (and after removing redundant inequalities) gives all the facets of the Shannon cone in the observable subspace. This set of inequalities consists of trivial and non-trivial ones. By non-trivial, we mean those inequalities that do not follow simply from the basic Shannon-type inequalities (monotonicity or strong subadditivity) but require the causal constraints to hold.

To illustrate, consider again the DAG of Fig. 1(b). We marginalize over the variables that we do not have direct empirical access to, in this case $\Lambda$ and $M$. However, we still want to keep the term $H(M)$ as part of our description, because this is exactly the term we would like to bound from the observations of $X$, $Y$ and $B$. Proceeding with the marginalization step we find that the only non-trivial constraints are

$$I(X : Y, B) \le H(M), \tag{A5}$$
$$I(X : Y) = 0. \tag{A6}$$


4. Deriving entropic witnesses in the prepare-and-measure scenario

We now move on to the DAG in Fig. 1(c). In this case, the causal constraints are given by

$$H(X_1, X_2, \Lambda) = H(X_1, X_2) + H(\Lambda), \tag{A7}$$
$$H(M|X_1, X_2, \Lambda) = 0, \tag{A8}$$
$$H(B_1, B_2|M, \Lambda) = 0. \tag{A9}$$

Notice that we do not impose independence between the inputs, that is, $I(X_1 : X_2) \ne 0$ in general. Performing FM elimination, we find that the only non-trivial inequalities are given by (up to permutations)

$$I(X_1, X_2 : B_1) \le H(M), \tag{A10}$$
$$I(X_1 : B_1) + I(X_2 : B_2) + I(X_1 : X_2 | B_1) - I(X_1 : X_2) \le H(M). \tag{A11}$$

The first inequality is similar to what we have obtained above, while the second inequality is the entropy witness described in the main text.

We notice that the same result holds true if we consider a modified DAG where the preparation and measurement devices are independent, i.e. where the shared variable $\Lambda$ is split into independent variables $\Lambda_1$, $\Lambda_2$ connected to $M$ and to the $B$'s respectively, with the new causal constraints

$$H(X_1, X_2, \Lambda_1, \Lambda_2) = H(X_1, X_2) + H(\Lambda_1) + H(\Lambda_2), \tag{A12}$$
$$H(M|X_1, X_2, \Lambda_1) = 0, \tag{A13}$$
$$H(B_1, B_2|M, \Lambda_2) = 0. \tag{A14}$$

This resembles the results discussed in the main text (cf. (11)), where we have shown that the optimal strategy minimising the entropy for given values of $I_3$, $I_4$, and $I_5$ (and, we conjecture, $I_n$ in general) does not require shared correlations between the two devices.

Following the ideas in [22] we can prove that inequality (A11) is also valid for a quantum message. The procedure is similar to the classical case, though there are a few important differences. Because the message is quantum, we have to replace $H(M)$ by the von Neumann entropy $S(\rho)$. Another difference is that we cannot assign an entropy to $B$ and $\rho$ simultaneously. This is because for $B$ to assume a determined value we first need to apply a completely positive, trace-preserving (CPTP) map on $\rho$ that in general will disturb $\rho$. Therefore, when constructing the set of inequalities and constraints we need to eliminate all those that contain $B$ and $\rho$ together, for example $S(\rho, B)$ and $S(\rho, X, B)$. A related problem is that one of the causal constraints valid in the classical case, $S(B|\rho, \Lambda) = 0$, cannot be defined in the quantum case since it involves the term $S(\rho, \Lambda, B)$. The idea in [22] is to replace these causal constraints by corresponding data-processing inequalities that are valid in quantum mechanics.

Following the approach in [22] we now prove that (A11) and its generalization (9) give valid bounds in the quantum case.

Proof. Rewrite the conditional mutual information appearing in (9) as

$$I(X_1 : X_i | B_i) = I(X_i : X_1, B_i) - I(X_i : B_i). \tag{A15}$$

Using this, the LHS of the inequality (9) can be rewritten as

$$I(X_1 : B_1) + \sum_{i=2}^{n} I(X_i : X_1, B_i) - \sum_{i=1}^{n} S(X_i) + S(X_1, \dots, X_n). \tag{A16}$$

This last expression can be upper bounded by

$$\le I(X_1 : \Lambda, \rho) + \sum_{i=2}^{n} I(X_i : X_1, \Lambda, \rho) - \sum_{i=1}^{n} S(X_i) + S(X_1, \dots, X_n) \tag{A17}$$
$$= S(\Lambda, \rho) + (n-2)\, S(X_1, \Lambda, \rho) - \sum_{i=2}^{n} S(X_1, X_i, \Lambda, \rho) + S(X_1, \dots, X_n) \tag{A18}$$
$$\le S(\Lambda, \rho) - S(X_1, \dots, X_n, \Lambda, \rho) + S(X_1, \dots, X_n) \tag{A19}$$
$$\le S(\Lambda, \rho) - S(X_1, \dots, X_n, \Lambda) + S(X_1, \dots, X_n) \tag{A20}$$
$$= S(\Lambda, \rho) - S(\Lambda) \tag{A21}$$
$$\le S(\rho), \tag{A22}$$

which gives exactly (9), as desired. In the above we have used: (i) the data-processing inequalities $I(X_i : B_i) \le I(X_i : \Lambda, \rho)$ and $I(X_i : X_1, B_i) \le I(X_i : X_1, \Lambda, \rho)$, (ii) the relation $-\sum_{i=2}^{n} S(X_1, X_i, \Lambda, \rho) \le -S(X_1, \dots, X_n, \Lambda, \rho) - (n-2)\, S(X_1, \Lambda, \rho)$, (iii) the monotonicity inequality $S(\rho | X_1, \dots, X_n, \Lambda) \ge 0$, (iv) the independence relation $I(X_1, \dots, X_n : \Lambda) = 0$, and (v) $S(\rho|\Lambda) \le S(\rho)$. Notice that we have used the von Neumann entropy $S$ for all terms. For the terms where all variables are purely classical (i.e. that do not involve $\rho$), the von Neumann and Shannon entropies coincide, for example $S(X_i) = H(X_i)$. □


Appendix B: Maximal violation of $I_n$ implies maximal entropy

As mentioned in the main text, inequality (9) can be used to prove that a maximal violation of the dimension witness $I_n$, which implies message dimension $d = n$, also implies maximal entropy, i.e. $S(\rho) \ge \log n$. We are interested in a scenario with $n$ preparations and $l = n - 1$ measurements, with respective probabilities given by $p(x) = 1/n$ and $p(y) = 1/l$. Notice however, that in the construction of the inequality (9) we have $l = n - 1$ explicit variables $X_i$. To encode the probability $p(x) = 1/n$ we consider each of the $X_i$ to be dichotomic variables and assign a joint probability distribution to them given by

$$p(x_1, \dots, x_{n-1}) = \begin{cases} 1/n, & x_i = 0 \ \forall i, \\ 1/n, & x_i = 1,\ x_{j \ne i} = 0 \ \text{for some } i, \\ 0, & \text{otherwise}. \end{cases} \tag{B1}$$

For example, in the case with 3 preparations and 2 measurements we have $p(0,0) = p(0,1) = p(1,0) = 1/3$ and $p(1,1) = 0$. Using the distribution $p(b|x,y)$ which achieves the maximum of $I_n$, and by direct calculation of the LHS of (9), we then find $S(\rho) \ge \log n$. That is, to achieve the maximal violation of the dimension witness $I_n$, one needs maximal entropy, regardless of whether classical or quantum systems are used.
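The input encoding (B1) is simple to construct explicitly. The sketch below builds the joint distribution of the $n - 1$ dichotomic variables as a dictionary keyed by bit strings and exposes the single-variable marginals. The dictionary representation and the function names are assumptions made for illustration.

```python
import itertools

def b1_distribution(n):
    """Joint distribution of n-1 bits: weight 1/n on strings with at most one 1."""
    joint = {}
    for bits in itertools.product((0, 1), repeat=n - 1):
        joint[bits] = 1.0 / n if sum(bits) <= 1 else 0.0
    return joint

def marginal_prob_one(joint, i):
    """P(X_i = 1) for the i-th variable (zero-based index)."""
    return sum(w for bits, w in joint.items() if bits[i] == 1)

# The case quoted in the text: 3 preparations, 2 dichotomic variables.
joint3 = b1_distribution(3)
```

There are exactly $n$ bit strings with at most one entry equal to 1, so the weights $1/n$ sum to one and each string of nonzero weight labels one of the $n$ preparations.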

Appendix C: Message dimension $n$ is sufficient

In this section we prove that messages of dimension at most $n$ are required when minimising the entropy, where $n$ is the number of inputs for the preparation device. We will use the following terminology. A deterministic point is an extremal point of the polytope in which the observed data $p(b|xy)$ lives. A deterministic strategy is a recipe assigning deterministically a message to each given input for the preparation device and an output to each given message and input for the measurement device. Deterministic strategies are labelled by $\lambda$. The data can be decomposed as

$$p(b|xy) = \sum_{\lambda,m} p(b|my\lambda)\, p(m|x\lambda)\, p(\lambda) \tag{C1}$$
$$= \sum_\lambda \Big( \sum_m \delta_{b, f_\lambda(m,y)}\, \delta_{m, g_\lambda(x)} \Big) p(\lambda) \tag{C2}$$
$$= \sum_\lambda A_{bxy,\lambda}\, p(\lambda). \tag{C3}$$

Here $f_\lambda$, $g_\lambda$ are the deterministic functions specified by the strategy $\lambda$. The quantity $A_{bxy,\lambda}$ gives the deterministic point resulting from the strategy $\lambda$. In general there may be different deterministic strategies which result in the same deterministic point, i.e. one can have $A_{bxy,\lambda} = A_{bxy,\lambda'}$ for different $\lambda$, $\lambda'$.

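The probability for a certain message $m$ to occur is given by

$$p(m) = \sum_{\lambda,x} p(m|x\lambda)\, p(x)\, p(\lambda) \tag{C4}$$
$$= \sum_\lambda \Big( \sum_x \delta_{m, g_\lambda(x)}\, p(x) \Big) p(\lambda) \tag{C5}$$
$$= \sum_\lambda B_{m\lambda}\, p(\lambda), \tag{C6}$$

where $B_{m\lambda}$ is the probability for $m$ given the strategy $\lambda$, averaged over the input distribution. Using this, the entropy of the message is

$$H(M) = -\sum_m p(m) \log(p(m)) \tag{C7}$$
$$= -\sum_m \Big( \sum_\lambda B_{m\lambda}\, p(\lambda) \Big) \log \Big( \sum_\lambda B_{m\lambda}\, p(\lambda) \Big). \tag{C8}$$

We are interested in what dimension is required for the message to achieve the minimum of this quantity, compatible with given observed data $p(b|xy)$.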

We note that for fixed $\lambda$, the deterministic function $g_\lambda$ giving the message $m$ as a function of $x$ is fixed. Since there are $n$ inputs to the function, there can be at most $n$ different outputs. Therefore $m$ takes at most $n$ different values. Thus, for each deterministic strategy at most $n$ different values of $m$ occur. If all deterministic strategies make use of the same labels, then the total message dimension is at most $n$. However, it could in principle be that different strategies use different labels, such that the total message dimension is larger than $n$. This is not advantageous in terms of minimising the entropy though, as we now show.

For simplicity, consider just two deterministic strategies
$\lambda_0$ and $\lambda_0'$. Denote the values of $m$ used in strategy
$\lambda_0$ by $\mu_1, \ldots, \mu_n$ and let us assume that for strategy $\lambda_0'$
some of the labels are the same while some are different,
e.g. $\mu_1, \ldots, \mu_j, \mu'_{j+1}, \ldots, \mu'_n$. Using (C6) and (C7) we
see that labels with $i \leq j$ will give rise to contributions
to the entropy of the form

\begin{equation}
(B_{\mu_i\lambda_0} + B_{\mu_i\lambda_0'}) \log (B_{\mu_i\lambda_0} + B_{\mu_i\lambda_0'}), \tag{C9}
\end{equation}

while the contributions from labels $\mu_i$, $\mu'_i$ with $i > j$ will
be

\begin{equation}
B_{\mu_i\lambda_0} \log (B_{\mu_i\lambda_0}) + B_{\mu'_i\lambda_0'} \log (B_{\mu'_i\lambda_0'}). \tag{C10}
\end{equation}

Now, for any two positive numbers $a$, $b$ with $a + b \leq 1$
one has that $(a + b)\log(a + b) \geq a\log(a) + b\log(b)$,
and these terms enter the entropy (C7) with an overall minus sign.
It follows that, when minimizing the entropy, it is always
advantageous to use the same set of labels in
both strategies $\lambda_0$ and $\lambda_0'$. In general, one should use
the same set of message labels in all the deterministic
strategies needed to reproduce the data $p(b|xy)$, and
hence a message of dimension at most $n$ is needed.
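The label-merging argument can be checked numerically. The Python sketch below compares the entropy contribution of two weights that share one label with the contribution when they carry two different labels, for randomly drawn weights. The helper names and the random sampling are our own illustrative choices, and logarithms are taken in base 2.

```python
import math
import random

def entropy_terms(weights):
    """Return -sum w*log2(w) over the strictly positive weights."""
    return -sum(w * math.log2(w) for w in weights if w > 0)

def merged_vs_split(a, b):
    """Entropy contribution with one shared label versus two distinct labels."""
    merged = entropy_terms([a + b])   # same label in both strategies
    split = entropy_terms([a, b])     # different labels in the two strategies
    return merged, split

random.seed(0)
results = []
for _ in range(1000):
    a = random.uniform(0.0, 1.0)
    b = random.uniform(0.0, 1.0 - a)  # keeps a + b <= 1
    results.append(merged_vs_split(a, b))

# Sharing a label never increases the entropy contribution.
all_ok = all(m <= s + 1e-12 for m, s in results)
print(all_ok)
```

A random scan is of course no proof; it only illustrates the direction of the inequality used in the argument.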

We note that, using concavity of the entropy, it is also 
possible to prove that there is no advantage in using 
several deterministic strategies for the same determin¬ 
istic point. I.e. it is optimal to take a single deterministic 
strategy for each deterministic point. 

Appendix D: Minimization of H(m) as a linear program 

As discussed in the main text, the minimum value 
of the entropy compatible with observed data p can be 
expressed as the following optimization problem (see 
also fqSj for a statement of this problem in the context 
of Bell inequalities): 

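\begin{equation}
\min H(M) \quad \text{s.t.} \quad Aq = p, \quad q_\lambda \geq 0 \quad \text{and} \quad \sum_{\lambda} q_\lambda = 1. \tag{D1}
\end{equation}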

Given the concavity property of the entropy function,
the minimum of $H(M)$ will be obtained at one of the
vertices of the polytope defined by the linear constraints
of the optimization problem (D1), which we denote $Q$.
However, the characterization of $Q$ can be quite demanding
computationally, which leads us to introduce
a simplified approach.

Notice that for evaluating $\min H(M)$ we only need
to consider the probability distribution
$p(m) = \sum_{\lambda,x} p(m|\lambda,x)\, p(\lambda)\, p(x)$ which, for a fixed value of $p(x)$,
is therefore a linear function of the underlying hidden
variable $\lambda$ (represented in (D1) via the vector $q$). The
linear constraints in (D1) will also imply linear constraints
on $p(m)$. That is, the observable data defines
a polytope $P$ characterizing the probability $p(m)$ that
is compatible with it. Therefore, to compute $\min H(M)$ we
only need to consider the extremal points of $P$. This
significantly reduces the computational complexity of the
problem and has allowed us to consider the
prepare-and-measure scenario with up to $n = 5$ preparations
and $l = 4$ measurements.

FIG. 3. In dashed red we see the polytopal region $1/6 \leq p(m) \leq 1/2$
and $\sum_m p(m) = 1$. This is an outer approximation
to the true polytopal region defined by the constraint
$I_3 = 4$. The actual polytope can be found by solving a
sequence of LPs and is shown in solid black.

To illustrate the general method for characterizing
$P$, in the following we will consider in detail and
without loss of generality the scenario $x \in \{0,1,2\}$,
$m \in \{0,1,2\}$, $b \in \{0,1\}$ with all preparations equally
likely, that is $p(x) = 1/3$. Given the data $p$ (or a linear
function $V(p)$ of it) the minimum and maximum values
of $p(m)$ compatible with it can be found via (D1),
where we simply replace the objective function $H(M)$
by $p(m)$.
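The following Python sketch illustrates this step for the witness value $I_3 = 4$ in the scenario just described. It rests on several assumptions that are not spelled out in this appendix: the witness is taken to be $I_3 = E_{11} + E_{12} + E_{21} - E_{22} - E_{31}$ (written with zero-based indices in the code), deterministic outcomes are $\pm 1$, and instead of calling an LP solver the code enumerates mixtures of two deterministic strategies. This brute-force stand-in is exact here because a linear program with only two equality constraints (normalisation and the witness value) attains its optimum on a mixture of at most two deterministic strategies. All function names are hypothetical.

```python
from fractions import Fraction as F
from itertools import product

# Assumed witness I_3 = E11 + E12 + E21 - E22 - E31, keyed by zero-based (x, y).
COEFF = {(0, 0): 1, (0, 1): 1, (1, 0): 1, (1, 1): -1, (2, 0): -1}

def strategy_points():
    """Distinct (I_3, p(m=0)) pairs over all deterministic strategies."""
    points = set()
    for g in product(range(3), repeat=3):        # g[x]: message sent on input x
        p0 = F(sum(1 for m in g if m == 0), 3)   # p(m=0) for uniform p(x) = 1/3
        for f in product((1, -1), repeat=6):     # f[2*m + y]: outcome for (m, y)
            val = sum(c * f[2 * g[x] + y] for (x, y), c in COEFF.items())
            points.add((val, p0))
    return points

def bounds_on_p0(target):
    """Min and max of p(m=0) over mixtures of strategies with I_3 = target."""
    pts = list(strategy_points())
    feasible = []
    for (i1, q1), (i2, q2) in product(pts, repeat=2):
        if i1 == i2:
            if i1 == target:
                feasible.append(q1)
            continue
        w = F(target - i2, i1 - i2)              # weight on the first strategy
        if 0 <= w <= 1:
            feasible.append(w * q1 + (1 - w) * q2)
    return min(feasible), max(feasible)

p_min, p_max = bounds_on_p0(4)
print(p_min, p_max)
```

For larger scenarios, or when the full data $p$ is imposed rather than a single witness value, the pair enumeration no longer applies and a proper LP solver is needed.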

For example, if we impose the constraint $I_3 = 4$ we
find that $1/6 \leq p(m=0) \leq 1/2$. By symmetry the
same holds true for $p(m=1)$ and $p(m=2)$ (since we
are optimizing over all classical strategies, the labels we
assign to $m$ are irrelevant). That is, under the constraint
$I_3 = 4$ the minimum of $H(M)$ is restricted to be in the
polytopal region defined by $1/6 \leq p(m) \leq 1/2$. Notice
that by normalization we can write the entropy $H(M)$
as a function of $p(m=0)$ and $p(m=1)$ alone, implying
a 2-dimensional polytopal region. The result is shown
in Fig. 3. Also notice that the actual polytopal region
implied by $I_3 = 4$ is smaller than (but contained in)
$1/6 \leq p(m) \leq 1/2$. The reason is that further constraints,
for example $p(m=0) = 1/2$, will imply new
constraints over $p(m=1)$, e.g. $1/6 \leq p(m=1) \leq 1/3$.
That is, this first polytope defines an outer approximation
to the true polytope, and therefore provides only
a lower bound (typically non-tight) on $H(M)$. The actual
polytope $P$ can be found by running a sequence of
linear programs (LPs) as we explain next.

FIG. 4. Representation of the polytope $P$ that can be found
by solving a sequence of 4 linear programs. The facets shown
in the figure correspond to $l_1 \to p(m=0) + p(m=1) \geq p_{\min} + p'_{\min}$
and $l_2 \to p(m=0) + p(m=1) \leq p_{\max} + p'_{\max}$.

First one needs to run two LPs to find the bounds
$p_{\min} \leq p(m) \leq p_{\max}$. Second, we need to find the
maximum value $p'_{\max}$ of $p(m=1)$ under the constraint
that $p(m=0) = p_{\max}$ and the minimum value $p'_{\min}$
of $p(m=1)$ under the constraint $p(m=0) = p_{\min}$.
By symmetry, the values of $p'_{\max}$ and $p'_{\min}$ will be the same
if we reverse the roles of $p(m=0)$ and $p(m=1)$.
From the fact that $p_{\max} + p'_{\min} + p_{\min} \leq 1$ and
$p_{\max} \geq p(m=2) = 1 - p(m=0) - p(m=1)$ it follows
that $p(m=0) + p(m=1) \geq p_{\min} + p'_{\min}$. Similarly,
from $p_{\max} + p'_{\max} + p_{\min} = 1$ and
$p(m=2) \geq p_{\min} \to 1 - p(m=0) - p(m=1) \geq p_{\min}$ it follows that
$p(m=0) + p(m=1) \leq p_{\max} + p'_{\max}$. An illustration
of this construction is shown in Fig. 4 and can be easily
extended to higher dimensions.
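Once the four numbers $p_{\min}$, $p_{\max}$, $p'_{\min}$, $p'_{\max}$ are known, the vertices of the resulting region in the $(p(m=0), p(m=1))$ plane can be enumerated and the entropy evaluated on each of them. The Python sketch below does this by intersecting pairs of facet lines and keeping the feasible intersection points. The function names are hypothetical, and the example values for $I_3 = 4$ ($p_{\min} = 1/6$, $p_{\max} = 1/2$, $p'_{\max} = 1/3$, together with $p'_{\min} = 1/3$, which is our own inference from normalisation and label symmetry rather than a value quoted in the text) are used for illustration only.

```python
from fractions import Fraction as F
from itertools import combinations
from math import log2

def polytope_vertices(p_min, p_max, pp_min, pp_max):
    """Vertices of the region in the (p0, p1) plane cut out by the six facets."""
    # Each constraint is stored as (a0, a1, c), meaning a0*p0 + a1*p1 <= c.
    cons = [
        (-1, 0, -p_min), (1, 0, p_max),          # p_min <= p0 <= p_max
        (0, -1, -p_min), (0, 1, p_max),          # p_min <= p1 <= p_max
        (-1, -1, -(p_min + pp_min)),             # p0 + p1 >= p_min + p'_min
        (1, 1, p_max + pp_max),                  # p0 + p1 <= p_max + p'_max
    ]
    verts = set()
    for (a0, a1, c), (b0, b1, d) in combinations(cons, 2):
        det = a0 * b1 - a1 * b0
        if det == 0:
            continue                             # parallel facet lines never meet
        p0 = F(c * b1 - a1 * d) / det            # Cramer's rule
        p1 = F(a0 * d - c * b0) / det
        if all(e0 * p0 + e1 * p1 <= f for e0, e1, f in cons):
            verts.add((p0, p1))
    return sorted(verts)

def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(float(p) * log2(float(p)) for p in probs if p > 0)

verts = polytope_vertices(F(1, 6), F(1, 2), F(1, 3), F(1, 3))
h_min = min(entropy((p0, p1, 1 - p0 - p1)) for p0, p1 in verts)
print(len(verts), h_min)
```

Because the entropy is concave, evaluating it on the vertices is enough to find its minimum over the region.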

In the main text we have used this procedure to compute
$\min H(M)$ given values of the dimension witnesses
$I_3$ and $I_4$, obtaining a general relation that we
conjecture to be true for any $I_n$. Furthermore, this relation
can be seen to hold for other classes of dimension
witnesses. To illustrate this point, we consider the
following inequality in the scenario with $n = 4$ preparations
and $l = 2$ measurements lE8l

\begin{equation}
R_4 = E_{11} + E_{12} + E_{21} - E_{22} - E_{31} + E_{32} - E_{41} - E_{42} \leq L_d, \tag{D2}
\end{equation}

where $L_d = 2d$ for $d \geq 2$ and $L_d = 0$ for $d = 1$. The
quantity $R_4$ quantifies the score in a $2 \to 1$ random access
code (RAC) game. In a RAC game one party (corresponding
to our preparation device) receives a string
of bits and then transmits a message to a second party
(corresponding to our measurement device). Given the
message and an index labelling one of the input bits, the
second party must produce a binary outcome equal to
that bit. $R_4$ corresponds to the case where the preparation
device receives 2 bits and the measurement device
receives 1 bit and produces a binary outcome.
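The classical bounds $L_d$ quoted for (D2) can be reproduced by enumerating deterministic strategies. In the Python sketch below, each encoding of the four preparations into $d$ messages is scored by letting the receiver pick, for every pair (message, measurement), the $\pm 1$ outcome that maximises its contribution. The function and variable names are hypothetical, and outcomes are assumed to take the values $\pm 1$.

```python
from itertools import product

# Coefficients of R_4 = E11 + E12 + E21 - E22 - E31 + E32 - E41 - E42,
# one row per preparation x, one column per measurement y.
R4_COEFF = [(1, 1), (1, -1), (-1, 1), (-1, -1)]

def classical_bound(d):
    """Maximum of R_4 over deterministic strategies with d message labels."""
    best = None
    for g in product(range(d), repeat=4):        # g[x]: message for preparation x
        total = 0
        for m in range(d):
            for y in range(2):
                # The receiver sees only (m, y); its best +1/-1 answer earns
                # the absolute value of the coefficients it cannot tell apart.
                total += abs(sum(R4_COEFF[x][y] for x in range(4) if g[x] == m))
        best = total if best is None else max(best, total)
    return best

bounds = [classical_bound(d) for d in range(1, 5)]
print(bounds)
```

Since $R_4$ is linear in the probabilities, mixtures of deterministic strategies cannot exceed these values, so the deterministic maximum is the classical bound for each $d$.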

A crucial difference between $R_4$ and the class $I_n$ resides
in the fact that for $I_n \leq n(n-3)/2 + 1$,
1-dimensional messages (and therefore with zero entropy)
are enough to reproduce the data. In contrast,
for any $R_4 \neq 0$ we need at least 2-dimensional systems.
We have followed the same steps as for $I_n$ and obtained
the classical curve in Fig. 5. This result is perfectly fitted
by the same expression as for the $I_n$ class in the
region $4 \leq R_4 \leq 8$. However, it fails for $R_4 < 4$. As
discussed above this is exactly the region where $R_4$ and the
$I_n$ class display a very different qualitative behaviour.
Therefore such a difference for $d = 2$ should come
as no surprise. In the region $R_4 < 4$, the minimum
entropy is described by $\min H(M) = H_{\mathrm{bin}}((1/16) R_4)$,
where $H_{\mathrm{bin}}(x) = -x \log_2 x - (1 - x) \log_2 (1 - x)$ stands
for the binary entropy.

As for $I_3$, $I_4$ we have also performed a numerical
optimisation for quantum strategies, as explained below.
The results are shown in Fig. 5. We see that, as for $I_3$,
$I_4$, quantum strategies allow a significant reduction in
entropy, i.e. in average communication. The optimisation
for quantum strategies was performed for qubits,
qutrits, and for real ququarts. Interestingly, unlike for
$I_3$, $I_4$, our results indicate that complex phases are
necessary to reach the optimum for the RAC. The real
ququart curve does not recover the results for qubits
and qutrits in the region achievable with qubits (note
though that the numerics are not completely stable in
this region, as may be seen from the plot). This suggests
that real ququarts may also not be optimal above this
region. At the same time, while in the qubit region we
find no advantage for qutrits, we do find an advantage
of ququarts over qutrits above the qubit region.



FIG. 5. Minimum values of the entropies $H(M)$, $S(\rho)$ for classical
(dotted) and quantum (solid, dashed) strategies compatible
with a given value of $R_4$. Solid and open circles indicate
the points of maximal witness value achievable with the indicated
dimension for classical and quantum strategies respectively.
The quantum curves show the optimisation for qubits
and qutrits (solid) and for real ququarts (dashed).

Appendix E: Upper bounding the maximum dimension 

As discussed in the main text, for all the cases considered
we observe that if given data can be reproduced
with a classical message of dimension $d$, the minimum
entropy $H(M)$ is also achieved with this dimension. In
the following we give a geometric explanation for this
effect. We consider without loss of generality the case
where the data can be reproduced with $d = 2$ and show
that allowing for $d = 3$ cannot lead to a smaller $H(M)$.

In the $d = 2$ case, because of the normalization constraint
$p(m=0) + p(m=1) = 1$ the polytope $P_1$ can
be represented as a 1-dimensional object simply given
by $p_{\min} \leq p(m=0) \leq p_{\max}$ (see Fig. 6(a)). By concavity
it follows that the minimum entropy over this
set is $\min H(M) = \min[H_{\mathrm{bin}}(p_{\min}), H_{\mathrm{bin}}(p_{\max})]$.
Consider now that we allow for $d = 3$, leading to a
2-dimensional polytope $P_2$ that we parametrize as a function
of $p(m=0)$ and $p(m=2)$. To show that this extra
dimension cannot improve $H(M)$ it is sufficient to give
an (in principle) outer approximation of $P_2$ and show
that $H(M)$ on all the extremal points of this set is larger
than or equal to $\min H(M)$.

The polytope $P_2$ is characterized by the following
constraints (see Fig. 6(b))

\begin{align}
C_1 &: 0 \leq p(m=2), \tag{E1} \\
C_2 &: 0 \leq p(m=0), \tag{E2} \\
C_3 &: p(m=0) + p(m=2) \geq p_{\min}, \tag{E3} \\
C_4 &: p(m=0) \leq p_{\max}, \tag{E4} \\
C_5 &: p(m=0) + p(m=2) \leq p_{\max} + p_{\min}, \tag{E5} \\
C_6 &: p(m=2) \leq p_{\max}. \tag{E6}
\end{align}
[FIG. 6. Panels (a) and (b): the polytopes $P_1$ and $P_2$, with $p_{\min}$ and $p_{\max}$ marked.]

Constraints $C_1$, $C_2$, $C_4$ and $C_6$ trivially follow; using
$p_{\max} + p_{\min} = 1$ we can easily prove $C_3$ and $C_5$
from $1 - p(m=0) - p(m=2) \leq p_{\max}$ and
$p(m=0) + p(m=2) \leq 1$, respectively. Therefore, polytope $P_2$
is characterized by six extremal points, two of which
are also extremal points of $P_1$. Defining
$H(\alpha, \beta) = -\alpha \log \alpha - \beta \log \beta - (1 - \alpha - \beta) \log(1 - \alpha - \beta)$,
the entropies of the six extremal points are given, respectively,
by $H_1 = H(p_{\max}, 0)$, $H_2 = H(p_{\min}, 0)$,
$H_3 = H(0, p_{\min})$, $H_4 = H(0, p_{\max})$, $H_5 = H(p_{\max}, p_{\min})$ and
$H_6 = H(p_{\min}, p_{\max})$. It follows that $H_1 = H_3$, $H_2 = H_4$
and $H_5 = H_6$. Therefore, to prove that this extra dimension
cannot improve $H(M)$ we only have to prove
that $H_5 \geq H_1$ and $H_5 \geq H_2$. The inequality $H_5 \geq H_1$ is
equivalent to

\begin{equation}
-p_{\min} \log p_{\min} - (1 - p_{\min} - p_{\max}) \log(1 - p_{\min} - p_{\max}) \geq -(1 - p_{\max}) \log(1 - p_{\max}), \tag{E7}
\end{equation}

which is trivially true since $p_{\max} + p_{\min} = 1$. Similarly
one can prove that $H_5 \geq H_2$, which concludes the proof.
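The conclusion of this appendix can be checked numerically. The Python sketch below evaluates the entropy on six candidate extremal points of the $d = 3$ polytope and compares the smallest value with the $d = 2$ minimum. It assumes $p_{\max} + p_{\min} = 1$, as used in the proof, and takes the six points to be $(p_{\max}, 0)$, $(p_{\min}, 0)$, $(0, p_{\min})$, $(0, p_{\max})$, $(p_{\max}, p_{\min})$ and $(p_{\min}, p_{\max})$ in the $(p(m=0), p(m=2))$ plane. The function names are hypothetical and logarithms are taken in base 2.

```python
from math import log2

def H(alpha, beta):
    """Shannon entropy (bits) of the distribution (alpha, beta, 1 - alpha - beta)."""
    probs = (alpha, beta, 1.0 - alpha - beta)
    # Entries at or below rounding level are treated as zero.
    return -sum(p * log2(p) for p in probs if p > 1e-15)

def check_extra_dimension(p_max):
    """Return (d = 2 minimum, minimum over the six d = 3 extremal points)."""
    p_min = 1.0 - p_max                  # appendix assumption p_max + p_min = 1
    vertices = [(p_max, 0.0), (p_min, 0.0), (0.0, p_min),
                (0.0, p_max), (p_max, p_min), (p_min, p_max)]
    h_d3 = min(H(a, b) for a, b in vertices)
    h_d2 = min(H(p_max, 0.0), H(p_min, 0.0))
    return h_d2, h_d3

for p_max in (0.5, 0.6, 0.75, 0.9):
    print(p_max, check_extra_dimension(p_max))
```

Under the stated assumption all six points give the same entropy, so the extra message label neither helps nor hurts at the extremal points.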


Appendix F: Bounding the entropy for quantum strategies 

In Fig. 2 of the main text we show curves for the
quantum entropy compatible with given values of the
witnesses $I_3$, $I_4$. These curves are obtained by numerical
optimisation and should be understood as upper
bounds on the minimal quantum entropy.

Specifically, the curves are obtained by maximising
the witness value while restricting the entropy $S(\rho) \leq s$
and then increasing $s$ from zero until the maximal witness
value is reached. A priori one might expect that
the optimisation should be performed over both preparations
and measurements. However, it is only necessary
to optimise over states because of the following observation.
For a given choice of preparations $\rho_1, \ldots, \rho_n$
and measurements $M_1, \ldots, M_{n-1}$ the expected quantum
value of the witness $I_n$ is given by (cf. the main text)

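\begin{align}
I_n &= \sum_{y=1}^{n-1} \mathrm{Tr}[\rho_1 M_y] + \sum_{x=2}^{n} \sum_{y=1}^{n+1-x} v_{xy}\, \mathrm{Tr}[\rho_x M_y] \nonumber \\
&= \sum_{y=1}^{n-1} \mathrm{Tr}\Big[ \Big( \rho_1 + \sum_{x=2}^{n+1-y} v_{xy}\, \rho_x \Big) M_y \Big] \nonumber \\
&= \sum_{y=1}^{n-1} \mathrm{Tr}[\tilde{\rho}_y M_y], \tag{F1}
\end{align}

with

\begin{equation}
\tilde{\rho}_y = \rho_1 + \sum_{x=2}^{n+1-y} v_{xy}\, \rho_x. \tag{F2}
\end{equation}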

The observables $M_y$ are binary, so they are hermitian
operators with eigenvalues $\pm 1$. Since the states $\rho_x$ are
hermitian, so are the combinations $\tilde{\rho}_y$. The maximal
value of $I_n$ is then attained by choosing $M_y$ to be diagonal
in the same basis as $\tilde{\rho}_y$ with eigenvalues $\pm 1$ on the
subspaces where $\tilde{\rho}_y$ has positive and negative eigenvalues
respectively. The maximum is thus equal to

\begin{equation}
I_n = \sum_{y=1}^{n-1} \sum_{k} |\lambda_k^y|, \tag{F3}
\end{equation}

where $\lambda_k^y$ are the eigenvalues of $\tilde{\rho}_y$. To obtain the
curves in Fig. 2 we pick a dimension, e.g. qubits, qutrits,
or ququarts, we parametrise the states $\rho_x$, and we
numerically maximise (F3) subject to $S(\rho) \leq s$, where
$\rho = \sum_x \rho_x / n$ is the average state assuming uniform
inputs. The optimisation is implemented using NMaximize
in Mathematica.
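To make (F3) concrete, the Python sketch below evaluates it for $n = 3$ and three fixed real qubit states, together with the entropy of their uniform average. It is not the optimisation used for the figures, only an evaluation at one hand-picked point. Several assumptions are ours: the signs $v_{21} = +1$, $v_{22} = -1$, $v_{31} = -1$ (the form $v_{xy} = +1$ if $x + y \leq n$ and $-1$ otherwise), the restriction to states with Bloch vectors in the x-z plane, and all function names. For such 2x2 operators the eigenvalues are available in closed form, so no linear algebra library is needed.

```python
from math import cos, sin, sqrt, pi, log2

def bloch_state(phi):
    """Real pure qubit state with Bloch vector (sin phi, cos phi) in the x-z plane.

    Stored as (trace, rx, rz), representing (trace*I + rx*sigma_x + rz*sigma_z)/2.
    """
    return (1.0, sin(phi), cos(phi))

def combine(terms):
    """Weighted sum of operators, each term given as (weight, (trace, rx, rz))."""
    return tuple(sum(s * op[i] for s, op in terms) for i in range(3))

def trace_norm(op):
    """Sum of |eigenvalues|; the eigenvalues are (trace +/- |r|)/2."""
    t, rx, rz = op
    r = sqrt(rx * rx + rz * rz)
    return abs((t + r) / 2) + abs((t - r) / 2)

def max_I3(rho1, rho2, rho3):
    """Eq. (F3) for n = 3, assuming v_21 = +1, v_22 = -1, v_31 = -1."""
    tilde1 = combine([(1, rho1), (1, rho2), (-1, rho3)])
    tilde2 = combine([(1, rho1), (-1, rho2)])
    return trace_norm(tilde1) + trace_norm(tilde2)

def average_entropy(states):
    """von Neumann entropy (bits) of the uniform average of the states."""
    n = len(states)
    t, rx, rz = combine([(1.0 / n, s) for s in states])
    r = sqrt(rx * rx + rz * rz)
    eigs = ((t + r) / 2, (t - r) / 2)
    return -sum(e * log2(e) for e in eigs if e > 1e-15)

states = [bloch_state(pi / 4), bloch_state(-pi / 4), bloch_state(pi)]
value = max_I3(*states)
entropy = average_entropy(states)
print(value, entropy)
```

In an actual optimisation the angles (and, for mixed states, the lengths of the Bloch vectors) would be the free parameters, with the entropy of the average state entering as the constraint $S(\rho) \leq s$.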

For the witness $I_3$ we have performed the optimisation
using fully parametrised qubits and qutrits, and
using real ququarts (i.e. a parametrisation without complex
phases). We find that, for the range of values of
$I_3$ which can be achieved by qubits, neither qutrits nor
ququarts provide any advantage in terms of lowering
the entropy (and in fact real qubits and real qutrits are
sufficient).

For $I_4$ we have performed the optimisation for fully
parametrised qubits, and for real qutrits and ququarts.
Again we find that in the range achievable by qutrits,
ququarts provide no advantage and in most of the
range achievable by qubits, qutrits and ququarts provide
no advantage. As before, real qubits perform the
same as when phases are included, indicating that this
may also be true for higher dimensions. We do, however,
observe a small advantage of qutrits and ququarts
in a narrow part of the qubit region, from $I_4 \approx 5.52$
to $I_4 = 6$. The minimal entropy achieved by qubits in
this region is a few percent larger than for qutrits and
ququarts according to our numerical results. We believe
this is due to suboptimal performance of the optimisation
algorithm, although we have found no better point
despite extensive testing.



